This notebook was prepared with the following environment settings.
print(R.version)
## _
## platform x86_64-apple-darwin13.4.0
## arch x86_64
## os darwin13.4.0
## system x86_64, darwin13.4.0
## status
## major 3
## minor 3.2
## year 2016
## month 10
## day 31
## svn rev 71607
## language R
## version.string R version 3.3.2 (2016-10-31)
## nickname Sincere Pumpkin Patch
The first step of the report is to apply topic modeling to the 58 inaugural addresses to find the top words used across speeches.
## [1] "1789-04-30" "1793-03-04" "1797-03-04" "1801-03-04" "1805-03-04"
## [6] "1809-03-04" "1813-03-04" "1817-03-04" "1821-03-04" "1825-03-04"
## [11] "1829-03-04" "1833-03-04" "1837-03-04" "1841-03-04" "1845-03-04"
## [16] "1849-03-05" "1853-03-04" "1857-03-04" "1861-03-04" "1865-03-04"
## [21] "1869-03-04" "1873-03-04" "1877-03-05" "1881-03-04" "1885-03-04"
## [26] "1889-03-04" "1893-03-04" "1897-03-04" "1901-03-04" "1905-03-04"
## [31] "1909-03-04" "1913-03-04" "1917-03-04" "1921-03-04" "1925-03-04"
## [36] "1929-03-04" "1933-03-04" "1937-01-20" "1941-01-20" "1945-01-20"
## [41] "1949-01-20" "1953-01-20" "1957-01-21" "1961-01-20" "1965-01-20"
## [46] "1969-01-20" "1973-01-20" "1977-01-20" "1981-01-20" "1985-01-21"
## [51] "1989-01-20" "1993-01-20" "1997-01-20" "2001-01-20" "2005-01-20"
## [56] "2009-01-20" "2013-01-21" "2017-01-20" NA
## Warning in dir.create(out.dir): 'vis' already exists
## Loading required namespace: servr
##
## Attaching package: 'shiny'
## The following object is masked from 'package:qdapRegex':
##
## validate
The interactive plot above is an LDAvis visualization. Take a moment and feel free to play with it yourself! (It’s unlikely you’ll break anything.) On the left-hand side, each circle represents a topic and carries a numeric label. Both the area of the circle and the number on it encode the prevalence of that topic. The “prevalence” here is the total number of tokens assigned to that topic divided by the total number of tokens in the entire corpus (i.e. the bigger the circle, the more prevalent the topic).
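As a rough illustration of the prevalence calculation, the sketch below uses a made-up document-topic matrix `theta` and made-up document lengths `doc.length` (both are assumptions for illustration, not values from this corpus). Each topic's expected token count is the sum over documents of topic share times document length, and dividing by the corpus total gives the prevalence.

```r
# Toy inputs (hypothetical, not taken from the inaugural corpus)
# theta[d, k]: share of document d's tokens assigned to topic k (rows sum to 1)
theta <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.6, 0.3,
                  0.3, 0.3, 0.4),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("doc", 1:3), paste0("topic", 1:3)))
doc.length <- c(doc1 = 1000, doc2 = 2000, doc3 = 1000)  # tokens per document

# Expected number of tokens per topic, summed over documents
# (doc.length is recycled down the rows, so row d is scaled by doc.length[d])
topic.tokens <- colSums(theta * doc.length)

# Prevalence: each topic's share of all tokens in the corpus
prevalence <- topic.tokens / sum(topic.tokens)
print(round(prevalence, 3))
```

With these toy numbers, the second topic would get the largest circle because it accounts for the largest share of tokens.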
When you select a topic, the right-hand side shows two sets of bars. The red bars indicate the estimated number of times each term appears in that topic, and the blue bars show the overall frequency of that term in the entire corpus.
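The two bar lengths can be sketched with toy numbers. The topic-term matrix `phi` and the per-topic token counts `topic.tokens` below are hypothetical and only serve to show the arithmetic: the red bar for a term is its estimated count within the selected topic, and the blue bar is its count summed over all topics.

```r
# Toy inputs (hypothetical): phi[k, w] is the probability of term w in topic k
phi <- matrix(c(0.5, 0.3, 0.2,
                0.1, 0.2, 0.7),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("topic1", "topic2"),
                              c("freedom", "government", "economy")))
topic.tokens <- c(topic1 = 1000, topic2 = 3000)  # tokens assigned to each topic

# Estimated count of each term within each topic
# (topic.tokens is recycled down the rows, so row k is scaled by topic.tokens[k])
term.topic.freq <- phi * topic.tokens

# Red bars: term counts inside the selected topic
selected <- "topic1"
red <- term.topic.freq[selected, ]

# Blue bars: term counts over the whole corpus
blue <- colSums(term.topic.freq)

print(rbind(red, blue))
```

A term whose red bar nearly fills its blue bar is used almost exclusively in the selected topic.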
Presidential inaugurations are snapshots of U.S. history and tradition. They are the first opportunity for newly elected leaders to strut their stuff, surrounded by historic venues. The topic model above suggests that several topics recur across presidents’ inaugural speeches. Presidents address sagging morale and a lack of confidence, and they are frank and honest about the realities of the economy and wars. America, Freedom, Economy, Government, Jobs, Equality and Reform are the core themes around which these speeches are built.
After getting the big picture of the topics that the 58 inaugural addresses share, let’s consider individual speeches. How have inaugural addresses changed over time? Does an inaugural address reflect its speaker? Take Obama and Trump as examples: either president’s inaugural address should carry a strong emotional charge, since Obama became the first Black president and Trump is the first president with no prior political or military experience.
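library(wordcloud)     # provides wordcloud()
library(RColorBrewer)  # provides brewer.pal()

# Word cloud of the 100 most frequent terms across all speeches
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,0.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0,
          use.r.layout=FALSE,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))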
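The call above expects a table with one row per term and a column named `sum(count)`. The notebook does not show how `tdm.overall` was built, so the following is only a sketch in base R, using a made-up tidy table `tdm.tidy` with `term`, `document` and `count` columns, of how per-speech counts could be collapsed into corpus-wide totals. The names `tdm.tidy` and `tdm.overall.toy` are hypothetical.

```r
# Hypothetical tidy term-document counts (one row per term per speech)
tdm.tidy <- data.frame(
  term     = c("america", "freedom", "america", "jobs", "freedom"),
  document = c("speechA", "speechA", "speechB", "speechB", "speechB"),
  count    = c(4, 2, 3, 5, 1),
  stringsAsFactors = FALSE
)

# Sum counts over all speeches for each term
tdm.overall.toy <- aggregate(count ~ term, data = tdm.tidy, FUN = sum)

# Rename the total to match the column used in the wordcloud() call
names(tdm.overall.toy)[names(tdm.overall.toy) == "count"] <- "sum(count)"

# Most frequent terms first
tdm.overall.toy <- tdm.overall.toy[order(-tdm.overall.toy$`sum(count)`), ]
print(tdm.overall.toy)
```

A column named `sum(count)` is what an unnamed aggregation typically produces, which is why the backticks are needed when referring to it.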
library(shiny)

shinyApp(
  ui = fluidPage(
    fluidRow(style = "padding-bottom: 20px;",
      column(4, selectInput('speech1', 'Speech 1', speeches,
                            selected=speeches[5])),
      column(4, selectInput('speech2', 'Speech 2', speeches,
                            selected=speeches[9])),
      # Slider from 20 to 200 words in steps of 20, starting at 100
      column(4, sliderInput('nwords', 'Number of words',
                            min = 20, max = 200, value=100, step = 20))
    ),
    fluidRow(
      plotOutput('wordclouds', height = "400px")
    )
  ),
  server = function(input, output, session) {
    # Collect the terms and counts of the two selected speeches in a list
    selectedData <- reactive({
      list(dtm.term1=ff.dtm$term[ff.dtm$document==as.character(input$speech1)],
           dtm.count1=ff.dtm$count[ff.dtm$document==as.character(input$speech1)],
           dtm.term2=ff.dtm$term[ff.dtm$document==as.character(input$speech2)],
           dtm.count2=ff.dtm$count[ff.dtm$document==as.character(input$speech2)])
    })
    output$wordclouds <- renderPlot(height = 400, {
      # Two clouds side by side, with room above each for a title
      par(mfrow=c(1,2), mar = c(0, 0, 3, 0))
      wordcloud(selectedData()$dtm.term1,
                selectedData()$dtm.count1,
                scale=c(4,0.5),
                max.words=input$nwords,
                min.freq=1,
                random.order=FALSE,
                rot.per=0,
                use.r.layout=FALSE,
                random.color=FALSE,
                colors=brewer.pal(9,"Blues"))  # "Blues" offers at most 9 colors
      # wordcloud() passes extra arguments to text(), which ignores `main`,
      # so the title is drawn separately
      title(main=input$speech1)
      wordcloud(selectedData()$dtm.term2,
                selectedData()$dtm.count2,
                scale=c(4,0.5),
                max.words=input$nwords,
                min.freq=1,
                random.order=FALSE,
                rot.per=0,
                use.r.layout=FALSE,
                random.color=FALSE,
                colors=brewer.pal(9,"Blues"))
      title(main=input$speech2)
    })
  },
  options = list(height = 600)
)
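Outside of Shiny, the same per-speech filtering can be tried directly at the console. The sketch below is a minimal, non-interactive version that assumes a table shaped like `ff.dtm` (columns `document`, `term`, `count`). The data in `ff.dtm.toy` is made up, and `top_terms()` is a hypothetical helper, not a function from the notebook.

```r
# Hypothetical stand-in for ff.dtm: one row per term per speech
ff.dtm.toy <- data.frame(
  document = c("speechA", "speechA", "speechA", "speechB", "speechB"),
  term     = c("hope", "change", "nation", "jobs", "nation"),
  count    = c(5, 3, 2, 6, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n most frequent terms of one speech
top_terms <- function(dtm, speech, n = 3) {
  rows <- dtm[dtm$document == speech, ]   # same filter as in the Shiny server
  rows <- rows[order(-rows$count), ]      # highest counts first
  head(rows$term, n)
}

top.a <- top_terms(ff.dtm.toy, "speechA")
top.b <- top_terms(ff.dtm.toy, "speechB")

# Terms that rank highly in both speeches
shared <- intersect(top.a, top.b)
print(list(speechA = top.a, speechB = top.b, shared = shared))
```

Comparing the two lists this way gives a quick textual check of what the side-by-side word clouds show visually.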